Autoencoder With Emotion Embedding for Speech Emotion Recognition

Authors

Abstract

An important part of the human-computer interaction process is speech emotion recognition (SER), which has been receiving more attention in recent years. However, although a wide diversity of methods has been proposed for SER, these approaches still cannot improve the performance. A key issue behind the low performance of SER systems is how to effectively extract emotion-oriented features. In this paper, we propose a novel algorithm, an autoencoder with emotion embedding, to extract deep emotion features. Unlike many previous works, instance normalization, a common technique in the style transfer field, is introduced into our model rather than batch normalization. Furthermore, the emotion embedding path in our method can lead the autoencoder to efficiently learn a priori knowledge from the label. It can enable the model to distinguish which features are most related to human emotion. We concatenate the latent representation learned by the autoencoder and the acoustic features obtained by the openSMILE toolkit. Finally, the concatenated feature vector is utilized for emotion classification. To improve the generalization of our method, a simple data augmentation approach is applied. Two publicly available and highly popular databases, IEMOCAP and EMODB, are chosen to evaluate our method. Experimental results demonstrate that the proposed model achieves significant performance improvement compared to other speech emotion recognition systems.
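The abstract contrasts instance normalization with batch normalization and describes concatenating the autoencoder's latent representation with openSMILE acoustic features. The following is a minimal NumPy sketch of those two ideas only, not the authors' implementation. The tensor layout `(batch, channels, time)`, the function names, and the feature sizes (64 and 384) are assumptions chosen for illustration, and the random arrays stand in for real encoder outputs and openSMILE features.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each sample and channel independently over the time axis.

    x has the assumed layout (batch, channels, time).
    """
    mean = x.mean(axis=2, keepdims=True)
    var = x.var(axis=2, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Normalize each channel using statistics pooled over batch and time."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def fuse_features(latent, acoustic):
    """Concatenate per-utterance latent and acoustic vectors along the feature axis."""
    return np.concatenate([latent, acoustic], axis=1)

rng = np.random.default_rng(0)

# Placeholder feature maps: 4 utterances, 8 channels, 50 frames.
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 50))
x_in = instance_norm(x)   # zero mean per (utterance, channel)
x_bn = batch_norm(x)      # zero mean per channel across the whole batch

# Placeholder vectors; the sizes are hypothetical, not taken from the paper.
latent = rng.normal(size=(4, 64))     # stands in for the autoencoder bottleneck
acoustic = rng.normal(size=(4, 384))  # stands in for openSMILE features
fused = fuse_features(latent, acoustic)
```

Under these assumptions, instance normalization removes per-utterance statistics, whereas batch normalization only removes statistics shared across the batch. The fused vector is what a downstream classifier would receive.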


Similar resources

Using denoising autoencoder for emotion recognition

In this paper, we propose to use the denoising autoencoder to generate robust feature representations for emotion recognition. In our method, the input of the denoising autoencoder is the normalized static feature set (state-of-the-art features for emotion recognition). This input is mapped to two hidden representations: one is to capture the neutral information from the input, and the other on...


Emotion Recognition from Speech

This paper proposes the classification of emotions based on spectral features using the Gaussian Mixture Model as the classifier. The performance of the Gaussian Mixture Model has been evaluated for two types of databases – acted and real-life speech corpora. The model has also been evaluated for the variation in its performance based on the speaker, gender of the speaker and the number of the ...


Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space

In this work, an emotion-pair based framework is proposed for speech emotion recognition, which constructs more discriminative feature subspaces for every two different emotions (emotion-pair) to generate more precise emotion bi-classification results. Furthermore, it is found that in the dimensional emotion space, the distances between some of the archetypal emotions are closer than the others...



Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3069818